Training Dependency Parsers with Partial Annotation
نویسندگان
چکیده
Recently, these has been a surge on studying how to obtain partially annotated data for model supervision. However, there still lacks a systematic study on how to train statistical models with partial annotation (PA). Taking dependency parsing as our case study, this paper describes and compares two straightforward approaches for three mainstream dependency parsers. The first approach is previously proposed to directly train a log-linear graph-based parser (LLGPar) with PA based on a forest-based objective. This work for the first time proposes the second approach to directly training a linear graph-based parse (LGPar) and a linear transition-based parser (LTPar) with PA based on the idea of constrained decoding. We conduct extensive experiments on Penn Treebank under three different settings for simulating PA, i.e., random dependencies, most uncertain dependencies, and dependencies with divergent outputs from the three parsers. The results show that LLGPar is most effective in learning from PA and LTPar lags behind the graphbased counterparts by large margin. Moreover, LGPar and LTPar can achieve best performance by using LLGPar to complete PA into full annotation (FA).
منابع مشابه
Combining Active Learning and Partial Annotation for Japanese Dependency Parsing
The machine learning-based approaches that dominate natural language processing research require massive amounts of labeled training data. Active learning has the potential to substantially reduce the human effort needed to prepare this data by allowing annotators to focus on only the most informative training examples. This paper shows how active learning can be used for domain adaptation of d...
متن کاملParse Imputation for Dependency Annotations
Syntactic annotation is a hard task, but it can be made easier by allowing annotators flexibility to leave aspects of a sentence underspecified. Unfortunately, partial annotations are not typically directly usable for training parsers. We describe a method for imputing missing dependencies from sentences that have been partially annotated using the Graph Fragment Language, such that a standard ...
متن کاملData-Driven Dependency Parsing of New Languages Using Incomplete and Noisy Training Data
We present a simple but very effective approach to identifying high-quality data in noisy data sets for structured problems like parsing, by greedily exploiting partial structures. We analyze our approach in an annotation projection framework for dependency trees, and show how dependency parsers from two different paradigms (graph-based and transition-based) can be trained on the resulting tree...
متن کاملCombining Active Learning and Partial Annotation for Domain Adaptation of a Japanese Dependency Parser
The machine learning-based approaches that dominate natural language processing research require massive amounts of labeled training data. Active learning has the potential to substantially reduce the human effort needed to prepare this data by allowing annotators to focus on only the most informative training examples. This paper shows that active learning can be used for domain adaptation of ...
متن کاملTraining Parsers on Partial Trees: A Cross-language Comparison
We present a study that compares data-driven dependency parsers obtained by means of annotation projection between language pairs of varying structural similarity. We show how the partial dependency trees projected from English to Dutch, Italian and German can be exploited to train parsers for the target languages. We evaluate the parsers against manual gold standard annotations and find that t...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
عنوان ژورنال:
- CoRR
دوره abs/1609.09247 شماره
صفحات -
تاریخ انتشار 2016